Record: GDN-Hybrid + Sliding Window Attention + compressed-code warmdown1000 - val_bpb 1.01671 (3-seed mean) #1576
Conversation
**BPB metric bug: space bytes double-counted (inherited from closed parent PR #1545)**

Bugged code in this PR (decompressed from the LZMA self-extractor):

```python
# build_sentencepiece_luts, around line 217
if piece.startswith("▁"):
    has_space[i] = True
    base_bytes[i] = len(piece[1:].encode("utf-8")) + 1  # +1 adds the space byte
```

Then the eval loop adds the same space byte again:

```python
tb += (has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]).to(torch.float64)
```

Reference implementation:

```python
piece = sp.id_to_piece(token_id)
if piece.startswith("▁"):
    has_leading_space_np[token_id] = True
    piece = piece[1:]  # strip ▁
base_bytes_np[token_id] = len(piece.encode("utf-8"))  # NO +1 here
```

The reference counts the space byte exactly once, in the eval loop, conditioned on `has_leading_space_lut[tgt] & ~is_boundary_token_lut[prev]`. Under the bugged LUT, every token whose piece starts with `▁` contributes one extra byte to the denominator, which inflates the byte count and deflates the reported bpb.

Running the parent PR's corrected LUT on the same checkpoint lands in the ~1.16–1.18 range (per @Abhishek8108's own correction on #1545), not 1.01671. The bug was missed here because the training code is wrapped in an LZMA self-extractor, which hides it from standard review. I suggest the maintainers decompress and re-score before this shifts the leaderboard.
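To make the double count concrete, here is a minimal self-contained sketch. The toy vocabulary and values are hypothetical, and the names only mirror the identifiers quoted above (they are not the actual repo code); the boundary-token mask is ignored for simplicity:

```python
import math

# Toy 4-piece vocabulary; "▁" marks a leading space (SentencePiece convention).
vocab = ["▁the", "▁cat", "sat", "."]

bugged_bytes = []     # base_bytes with the erroneous +1
reference_bytes = []  # base_bytes without the +1
has_leading_space = []

for piece in vocab:
    if piece.startswith("▁"):
        has_leading_space.append(True)
        stripped = piece[1:]
        bugged_bytes.append(len(stripped.encode("utf-8")) + 1)  # bug: +1 here
        reference_bytes.append(len(stripped.encode("utf-8")))   # correct
    else:
        has_leading_space.append(False)
        bugged_bytes.append(len(piece.encode("utf-8")))
        reference_bytes.append(len(piece.encode("utf-8")))

def total_bytes(base):
    # The eval loop adds one space byte per leading-space token in BOTH
    # versions, so the bugged LUT ends up counting that byte twice.
    return sum(base) + sum(has_leading_space)  # True counts as 1

nats = 80.0  # hypothetical summed cross-entropy in nats
for name, base in [("bugged", bugged_bytes), ("reference", reference_bytes)]:
    tb = total_bytes(base)
    print(name, tb, nats / (math.log(2) * tb))
```

With the same loss, the bugged denominator is strictly larger, so the bugged bpb is strictly lower, which is the direction of the discrepancy reported above.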
Summary
val_bpb = 1.01671233 (3-seed mean, std 0.00134386)
15.71–15.90 MB
Improves the GDN-Hybrid fixed-predictor line with a warmdown1000 schedule and compressed-code packaging, without eval-time adaptation.
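For reference, bits-per-byte is conventionally the summed cross-entropy in nats divided by ln(2) times the total UTF-8 byte count of the validation text. A minimal sketch with hypothetical numbers (not from this run):

```python
import math

def bits_per_byte(total_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (nats) to bits per UTF-8 byte."""
    return total_nats / (math.log(2) * total_bytes)

# Hypothetical: a loss of ln(2) nats over a single byte is exactly 1.0 bpb.
print(bits_per_byte(math.log(2), 1))
```

This is why any error in the byte-count denominator translates directly into a proportional shift of the reported score.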
Architecture / Technique Stack
Compliance
Notes
XSA telemetry is reported for completeness, but the submitted score is the fixed-model quantized_bpb result above.
Credits